Parsing the Wall Street Journal with the Inside-Outside Algorithm

Authors

  • Yves Schabes
  • Michal Roth
  • Randy Osborne
Abstract

We report grammar inference experiments on partially parsed sentences taken from the Wall Street Journal corpus using the inside-outside algorithm for stochastic context-free grammars. The initial grammar for the inference process makes no assumption about the kinds of structures and their distributions. The inferred grammar is evaluated by its predictive power and by comparing the bracketing of held-out sentences imposed by the inferred grammar with the partial bracketings of these sentences given in the corpus. Using part-of-speech tags as the only source of lexical information, high bracketing accuracy is achieved even with a small subset of the available training material (1045 sentences): 94.4% for test sentences shorter than 10 words and 90.2% for sentences shorter than 15 words.
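
The bracketing comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common reading of bracketing accuracy, namely the fraction of brackets proposed by the grammar that do not cross any bracket in the corpus annotation; the abstract itself does not spell out the exact definition. The function names and the representation of brackets as half-open (start, end) word-index pairs are hypothetical choices made for this example.

```python
def crosses(a, b):
    """Return True if spans a and b overlap without one containing the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def bracketing_accuracy(predicted, gold):
    """Fraction of predicted spans that cross no gold span.

    Assumed definition (not taken from the abstract): a predicted bracket
    counts as correct when it is compatible with every gold bracket.
    Spans are half-open (start, end) word indices.
    """
    if not predicted:
        return 1.0
    compatible = sum(
        1 for p in predicted if not any(crosses(p, g) for g in gold)
    )
    return compatible / len(predicted)


# Hypothetical five-word sentence: one predicted span, (1, 3), crosses the
# gold span (0, 2); the other three predicted spans are compatible.
predicted = [(0, 5), (0, 2), (1, 3), (3, 5)]
gold = [(0, 5), (0, 2), (2, 5)]
accuracy = bracketing_accuracy(predicted, gold)
```

Because the corpus bracketings are only partial, a compatibility-based measure of this kind does not penalize the grammar for proposing extra brackets that the annotation leaves unspecified.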

Similar resources

Re-estimation of Lexical Parameters for Treebank PCFGs

We present procedures which pool lexical information estimated from unlabeled data via the Inside-Outside algorithm, with lexical information from a treebank PCFG. The procedures produce substantial improvements (up to 31.6% error reduction) on the task of determining subcategorization frames of novel verbs, relative to a smoothed Penn Treebank-trained PCFG. Even with relatively small quantitie...

Viterbi Training Improves Unsupervised Dependency Parsing

We show that Viterbi (or “hard”) EM is well-suited to unsupervised grammar induction. It is more accurate than standard inside-outside re-estimation (classic EM), significantly faster, and simpler. Our experiments with Klein and Manning’s Dependency Model with Valence (DMV) attain state-of-the-art performance — 44.8% accuracy on Section 23 (all sentences) of the Wall Street Journal corpus — wit...

Investigation of chemical adsorption of CO, CO2, H2 and NO molecules on inside and outside of single-wall nanotube using HF and DFT calculations

In this research, CO gas molecules were approached to single-wall carbon nanotube (SWNT) and (6,0) CNT surface from carbon side and oxygen side in three states (top, bridge, centre) and two shapes (vertical, horizontal), then adsorption energies were calculated by B3LYP/6-31G, B3LYP/3-21G and HF/3-21G methods; after that they were compared in order to obtain the most stable adsorption state. DFT a...

Applying Co-Training Methods to Statistical Parsing

We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set and a large pool of unlabeled text. The algorithm iteratively labels the entire data set with parse trees. Using empirical results based on parsing the Wall Street...

Unsupervised Learning of Hierarchical Dependency Parsing

In a sentence, a dependency is a link from one word, called a dependent or attachment, to another, the head. If we imagine that the global head word itself connects to some root node, then a sentence of length n corresponds to a parse tree with exactly n links. Constituency parses, such as those in the Wall Street Journal section of the Penn Tree-Bank, can be converted to dependency parses usin...

Publication date: 1993